Batch effect correction

In Section @ref(cell-quality) we observed staining/expression differences between the individual samples. This can arise due to technical (e.g., differences in sample processing) as well as biological (e.g. differential expression between patients/indications) reasons. However, the combination of these effects hinders cell phenotyping via clustering as highlighted in Section @ref(clustering).

To integrate cells across samples, we can use computational strategies developed for correcting batch effects in single-cell RNA sequencing data. In the following sections, we will use functions of the batchelor, harmony and Seurat packages to correct for such batch effects.

We will only test harmony, as the other two took several days!!

Of note: the correction approaches presented here aim at removing any differences between samples. This will also remove biological differences between the patients/indications. Nevertheless, integrating cells across samples can facilitate the detection of cell phenotypes via clustering.

First, we will read in the SpatialExperiment object containing the single-cell data.

When a code chunk is time-consuming to run, you may consider caching it via the chunk option cache = TRUE. When the cache is turned on, knitr will skip the execution of this code chunk if it has been executed before and nothing in the code chunk has changed since then. This is explained in more detail here. However cache=TRUE did not work and was computing batch correction anyway. Thus the chunk was commented.

path <- "/mnt/Multimodal_Imaging_Daria/_Data_Analysis/_tmp_daria/Image_analysis/20220723_Steinbock_Mesmer_IFbased"
spe <- readRDS(file.path(path,"data/spe.rds"))

harmony correction

The harmony algorithm performs batch correction by iteratively clustering and correcting the positions of cells in PCA space [@Korsunsky2019]. It requires a matrix of transformed expression counts and internally performs PCA before kmeans clustering. We will first create the expression matrix and call the HarmonyMatrix function to perform the correction.

Paper: Fast, sensitive and accurate integration of single-cell data with Harmony
Documentation: harmony

Similar to the fastMNN function, harmony returns the corrected low-dimensional coordinates for each cell. These can be saved in the reducedDim slot of the original SpatialExperiment object.

#library(harmony)

#mat <- t(assay(spe, "exprs")[rowData(spe)$use_channel,])

#harmony_emb <- HarmonyMatrix(mat, spe$sample, do_pca = TRUE)

#reducedDim(spe, "harmony") <- harmony_emb

The computational time of the HarmonyMatrix function call is 0 minutes.

Visualization

We will now again visualize the cells in low dimensions after UMAP embedding.

#set.seed(220228)
#detectCores(all.tests = FALSE, logical = TRUE) # find out how many logical cores available
#spe <- runUMAP(spe, dimred = "harmony", name = "UMAP_harmony", BPPARAM = MulticoreParam()) 
# visualize sample

library(cowplot)
library(dittoSeq)
library(viridis)
library(RColorBrewer)

n <- length(metadata(spe)$color_vectors$sample)
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))

metadata(spe)$color_vectors$sample <- sample(col_vector, n)

p1 <- dittoDimPlot(spe, var = "sample", 
                   reduction.use = "UMAP", size = 0.2, legend.size=2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$sample) +
    ggtitle("Sample ID on UMAP before correction") + 
    theme(legend.text=element_text(size=2))
p2 <- dittoDimPlot(spe, var = "sample", 
                   reduction.use = "UMAP_harmony", size = 0.2, legend.size=2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$sample) +
    ggtitle("Sample ID on UMAP after correction") + 
    theme(legend.text=element_text(size=2))

plot_grid(p1)

plot_grid(p2)

And we visualize selected marker expression as defined above.

# Before correction
plot_list <- multi_dittoDimPlot(spe, var = rownames(spe)[rowData(spe)$use_channel], reduction.use = "UMAP", 
                   assay = "exprs", size = 0.2, list.out = TRUE) 
plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
plot_grid(plotlist = plot_list, nrow=13, ncol=3) 

# After correction
plot_list <- multi_dittoDimPlot(spe, var = rownames(spe)[rowData(spe)$use_channel], reduction.use = "UMAP_harmony", 
                   assay = "exprs", size = 0.2, ncol=3, nrow=13, list.out = TRUE) 
plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
plot_grid(plotlist = plot_list, nrow=13, ncol=3) 

We observe a more aggressive merging of cells from different samples compared to the results after fastMNN correction. What we can see in both, corrected and uncorrected, is that T-cells appear in the similar region with expression of CD3, CD8, CD4, CD279 and Granzyme B. CD45 and CD15 are everywhere even in the tumor regions (GD2+, CD56+, GATA3+, ELAVL4+), which shouldn’t be the case. HLA-DR, CD14 and CD11c and CD274 come up in similar regions. CD24, CD10 regions could be immature neutrophils. In general, correction causes CD15, CD45 and HLA-ABC to be even more homogeneously expressed, while it improves the segregation of other cell types like tumor cells.

Seurat correction

Seurat and fastMNN were tested but required too much computing time (several days), and were stopped in the end.

Use plan(“multicore”) to parallelize Seurat and adjust future.globals.maxSize if required, as explained here.

Choosing the correct integration approach is challenging without having ground truth cell labels available. It is recommended to compare different techniques and different parameter settings. Please refer to the documentation of the individual tools to become familiar with the possible parameter choices. Furthermore, in the following section, we will discuss clustering and classification approaches in light of expression differences between samples.

In general, it appears that MNN-based approaches are less conservative in terms of merging compared to harmony. On the other hand, harmony could well merge cells in a way that regresses out biological signals.

Save objects

The modified SpatialExperiment object is saved for further downstream analysis.

saveRDS(spe, file.path(path, "data/spe.rds"))

Session Info

SessionInfo
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-3          viridis_0.6.2              
##  [3] viridisLite_0.4.0           dittoSeq_1.8.1             
##  [5] cowplot_1.1.1               scater_1.24.0              
##  [7] ggplot2_3.3.6               scuttle_1.6.2              
##  [9] imcRtools_1.3.5             SpatialExperiment_1.6.1    
## [11] SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
## [13] Biobase_2.56.0              GenomicRanges_1.48.0       
## [15] GenomeInfoDb_1.32.3         IRanges_2.30.0             
## [17] S4Vectors_0.34.0            BiocGenerics_0.42.0        
## [19] MatrixGenerics_1.8.1        matrixStats_0.62.0         
## [21] BiocParallel_1.30.3        
## 
## loaded via a namespace (and not attached):
##   [1] systemfonts_1.0.4         plyr_1.8.7               
##   [3] igraph_1.3.4              sp_1.5-0                 
##   [5] shinydashboard_0.7.2      digest_0.6.29            
##   [7] htmltools_0.5.3           magick_2.7.3             
##   [9] tiff_0.1-11               fansi_1.0.3              
##  [11] magrittr_2.0.3            ScaledMatrix_1.4.0       
##  [13] tzdb_0.3.0                limma_3.52.2             
##  [15] readr_2.1.2               graphlayouts_0.8.0       
##  [17] svgPanZoom_0.3.4          vroom_1.5.7              
##  [19] R.utils_2.12.0            svglite_2.1.0            
##  [21] jpeg_0.1-9                colorspace_2.0-3         
##  [23] ggrepel_0.9.1             xfun_0.32                
##  [25] dplyr_1.0.9               crayon_1.5.1             
##  [27] RCurl_1.98-1.8            jsonlite_1.8.0           
##  [29] glue_1.6.2                polyclip_1.10-0          
##  [31] gtable_0.3.0              nnls_1.4                 
##  [33] zlibbioc_1.42.0           XVector_0.36.0           
##  [35] DelayedArray_0.22.0       BiocSingular_1.12.0      
##  [37] DropletUtils_1.16.0       Rhdf5lib_1.18.2          
##  [39] HDF5Array_1.24.2          abind_1.4-5              
##  [41] scales_1.2.0              pheatmap_1.0.12          
##  [43] DBI_1.1.3                 edgeR_3.38.4             
##  [45] Rcpp_1.0.9                xtable_1.8-4             
##  [47] units_0.8-0               dqrng_0.3.0              
##  [49] rsvd_1.0.5                bit_4.0.4                
##  [51] proxy_0.4-27              DT_0.24                  
##  [53] htmlwidgets_1.5.4         ellipsis_0.3.2           
##  [55] pkgconfig_2.0.3           R.methodsS3_1.8.2        
##  [57] farver_2.1.1              sass_0.4.2               
##  [59] locfit_1.5-9.6            utf8_1.2.2               
##  [61] labeling_0.4.2            tidyselect_1.1.2         
##  [63] rlang_1.0.4               RTriangle_1.6-0.10       
##  [65] later_1.3.0               munsell_0.5.0            
##  [67] tools_4.2.0               cachem_1.0.6             
##  [69] cli_3.3.0                 generics_0.1.3           
##  [71] ggridges_0.5.3            evaluate_0.16            
##  [73] stringr_1.4.0             fastmap_1.1.0            
##  [75] fftwtools_0.9-11          yaml_2.3.5               
##  [77] bit64_4.0.5               knitr_1.39               
##  [79] tidygraph_1.2.1           purrr_0.3.4              
##  [81] ggraph_2.0.6              sparseMatrixStats_1.8.0  
##  [83] mime_0.12                 R.oo_1.25.0              
##  [85] concaveman_1.1.0          compiler_4.2.0           
##  [87] rstudioapi_0.13           beeswarm_0.4.0           
##  [89] png_0.1-7                 e1071_1.7-11             
##  [91] tibble_3.1.8              tweenr_1.0.2             
##  [93] bslib_0.4.0               stringi_1.7.8            
##  [95] highr_0.9                 cytomapper_1.9.1         
##  [97] lattice_0.20-45           Matrix_1.4-1             
##  [99] classInt_0.4-7            vctrs_0.4.1              
## [101] pillar_1.8.0              lifecycle_1.0.1          
## [103] rhdf5filters_1.8.0        jquerylib_0.1.4          
## [105] BiocNeighbors_1.14.0      irlba_2.3.5              
## [107] data.table_1.14.2         bitops_1.0-7             
## [109] raster_3.5-21             httpuv_1.6.5             
## [111] R6_2.5.1                  promises_1.2.0.1         
## [113] KernSmooth_2.23-20        gridExtra_2.3            
## [115] vipor_0.4.5               codetools_0.2-18         
## [117] MASS_7.3-56               assertthat_0.2.1         
## [119] rhdf5_2.40.0              rjson_0.2.21             
## [121] withr_2.5.0               GenomeInfoDbData_1.2.8   
## [123] parallel_4.2.0            hms_1.1.1                
## [125] EBImage_4.38.0            terra_1.6-7              
## [127] grid_4.2.0                beachmat_2.12.0          
## [129] class_7.3-20              tidyr_1.2.0              
## [131] rmarkdown_2.14            DelayedMatrixStats_1.18.0
## [133] sf_1.0-8                  ggforce_0.3.3            
## [135] shiny_1.7.2               ggbeeswarm_0.6.0